nyc_medicare_2017 <- medicare_part_b_2017 %>%
  filter(`Provider Zip Code` %in% nyc_zip_codes) %>%
  mutate(zip_3 = substr(`Provider Zip Code`, 1, 3),
         borough = ifelse(zip_3 == 104, "Bronx", ifelse(zip_3 == 112, "Brooklyn", 
                                                        ifelse(zip_3 == 100, "Manhattan", 
                                                               ifelse(zip_3 == 103, "Staten Island", "Queens")))),
         health_plus = ifelse(`Provider Name` %in% nyc_health_plus, "Y", "N"))
nyc_hospitals <- nyc_medicare_2017$`Provider Name` %>%
  unique
nyc_hospitals_borough <- nyc_medicare_2017 %>%
  group_by(borough) %>%
  summarise(number_hospitals = n_distinct(`Provider Name`))
datatable(nyc_hospitals_borough)

Okay so if we look at this nyc_hospitals_borough, we clearly see that there is a geographical imbalance in terms of the number of hospitals per borough. Queens is probably the most underserved borough in this regard since it is also the biggest borough by land area, right?

We also have another problem. How do you determine if a procedure is the most common? Do you sum up the number of procedures in a given year, and then rank? Or do you find the most common procedure at each hospital?

procedures_count <- nyc_medicare_2017 %>%
  group_by(`DRG Definition`) %>%
  summarise(discharge_count = sum(`Total Discharges`)) %>%
  arrange(desc(discharge_count))
head(procedures_count, n = 10)
## # A tibble: 10 x 2
##    `DRG Definition`                                              discharge_count
##    <chr>                                                                   <dbl>
##  1 871 - SEPTICEMIA OR SEVERE SEPSIS W/O MV >96 HOURS W MCC                10278
##  2 470 - MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTRE~            8317
##  3 291 - HEART FAILURE & SHOCK W MCC                                        5879
##  4 392 - ESOPHAGITIS, GASTROENT & MISC DIGEST DISORDERS W/O MCC             2828
##  5 312 - SYNCOPE & COLLAPSE                                                 2613
##  6 690 - KIDNEY & URINARY TRACT INFECTIONS W/O MCC                          2482
##  7 872 - SEPTICEMIA OR SEVERE SEPSIS W/O MV >96 HOURS W/O MCC               2324
##  8 190 - CHRONIC OBSTRUCTIVE PULMONARY DISEASE W MCC                        2003
##  9 683 - RENAL FAILURE W CC                                                 1874
## 10 378 - G.I. HEMORRHAGE W CC                                               1830

So, if we look at the 10 most common procedures by the total number of discharges, septicemia/sepsis treatments are leading the pack.

nyc_medicare_2017_ten_common <- nyc_medicare_2017 %>%
  filter(`DRG Definition` %in% unlist(procedures_count$`DRG Definition`)[1:10]) %>%
  select(`Provider Name`, `DRG Definition`, `Total Discharges`, `Average Total Payments`, borough, health_plus) %>%
  mutate(DRG_num = as.character(substr(`DRG Definition`, 1, 3)))
cost_boxplot <- ggplot(nyc_medicare_2017_ten_common, aes(DRG_num, `Average Total Payments`)) + 
  geom_boxplot() + 
  labs(title = "Box Plot of Average Total Payments by DRG",
       subtitle = "Each observation is the averal total payments attributed to each hospital",
       x = "DRG Number",
       y = "Average Total Payments")
cost_boxplot

Okay, so there is a huge variance between hospitals for each of the ten most common procedures. Are there hospitals that are consistently above or below average?

nyc_medicare_2017_ten_common_zscores <- nyc_medicare_2017_ten_common %>%
  group_by(DRG_num) %>%
  mutate(mean_total_payment = mean(`Average Total Payments`),
         sd_total_payment = sd(`Average Total Payments`)) %>%
  ungroup() %>%
  mutate(z_score = (`Average Total Payments`-mean_total_payment)/sd_total_payment)
hospital_scatter <- ggplot(nyc_medicare_2017_ten_common_zscores, aes(`Provider Name`, z_score, label = DRG_num)) + 
  geom_point(aes(col = health_plus)) + 
  geom_hline(yintercept = 0) +
  coord_flip() + 
  labs(title = "Z-score Scatterplot",
       subtitle = "No. of std dev for each common DRG by hospital",
       y = "Standard Deviations",
       x = "Hospital")
ggplotly(hospital_scatter)

Scandal! All NYC Health-Plus, ie public hospitals, charge more than private ones, with the exception of Coney Island Hospital.